Towards Resolving Unidentifiability in Inverse Reinforcement Learning

Authors

  • Kareem Amin
  • Satinder P. Singh
Abstract

We consider a setting for Inverse Reinforcement Learning (IRL) where the learner is extended with the ability to actively select multiple environments, observing an agent’s behavior on each environment. We first demonstrate that if the learner can experiment with any transition dynamic on some fixed set of states and actions, then there exists an algorithm that reconstructs the agent’s reward function to the fullest extent theoretically possible, and that requires only a small (logarithmic) number of experiments. We contrast this result to what is known about IRL in single fixed environments, namely that the true reward function is fundamentally unidentifiable. We then extend this setting to the more realistic case where the learner may not select any transition dynamic, but rather is restricted to some fixed set of environments that it may try. We connect the problem of maximizing the information derived from experiments to active submodular function maximization, and demonstrate that a greedy algorithm is near optimal (up to logarithmic factors). Finally, we empirically validate our algorithm on an environment inspired by behavioral psychology.
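The greedy selection idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract refers to active submodular maximization, whereas the code below shows only the simpler, non-adaptive greedy rule for a monotone submodular set function. The names `greedy_select`, `coverage_value`, and `COVERAGE`, as well as the toy "environments" and the pieces of information they reveal, are hypothetical and chosen purely for illustration.

```python
from typing import Callable, FrozenSet, Iterable, List

def greedy_select(
    candidates: Iterable[str],
    value: Callable[[FrozenSet[str]], float],
    budget: int,
) -> List[str]:
    """Pick up to `budget` items, each time taking the largest marginal gain."""
    chosen: List[str] = []
    remaining = list(candidates)
    for _ in range(budget):
        current = value(frozenset(chosen))
        best, best_gain = None, 0.0
        for c in remaining:
            # Marginal gain of adding candidate c to the current selection.
            gain = value(frozenset(chosen) | {c}) - current
            if gain > best_gain:
                best, best_gain = c, gain
        if best is None:
            break  # no candidate adds anything further
        chosen.append(best)
        remaining.remove(best)
    return chosen

# Hypothetical toy data: each environment "reveals" a set of facts.
# Set coverage is a standard example of a monotone submodular function.
COVERAGE = {
    "env_a": {1, 2, 3},
    "env_b": {3, 4},
    "env_c": {4, 5, 6, 7},
    "env_d": {1, 7},
}

def coverage_value(selected: FrozenSet[str]) -> float:
    """Number of distinct facts revealed by the selected environments."""
    covered = set()
    for name in selected:
        covered |= COVERAGE[name]
    return float(len(covered))

if __name__ == "__main__":
    print(greedy_select(list(COVERAGE), coverage_value, budget=2))
```

In this toy instance the first pick is the environment that reveals the most facts on its own, and each later pick is judged only by what it adds beyond the facts already covered; selection stops early once no remaining environment adds anything.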

Similar resources

Iterative Learning without Reinforcement or Reward for Multijoint Movements: A Revisit of Bernstein's DOF Problem on Dexterity

A robot designed to mimic a human becomes kinematically redundant, and its total degrees of freedom become larger than the number of physical variables required for describing a given task. Kinematic redundancy may contribute to enhancement of dexterity and versatility, but it incurs a problem of ill-posedness of inverse kinematics from the task space to the joint space. This ill-posedness was o...

Survey of effective factors on learning motivation of clinical students and suggesting the appropriate methods for reinforcement the learning motivation from the viewpoints of nursing and midwifery faculty, Tabriz University of Medical Sciences 2002.

Introduction. Motives are a powerful force in the education-learning process, so that even the richest and best training plans and structured education are not effective if motivation is lacking. Since the success of a teacher depends on the learning motivation of students, it is necessary for teachers to know the effective methods for motivating the students and t...

Efficient Probabilistic Performance Bounds for Inverse Reinforcement Learning

In the field of reinforcement learning there has been recent progress towards safety and high-confidence bounds on policy performance. However, to our knowledge, no methods exist for determining high-confidence safety bounds for a given evaluation policy in the inverse reinforcement learning setting—where the true reward function is unknown and only samples of expert behavior are given. We prop...

Towards Behavior-Aware Model Learning from Human-Generated Trajectories

Inverse reinforcement learning algorithms recover an unknown reward function for a Markov decision process, based on observations of user behaviors that optimize this reward function. Here we consider the complementary problem of learning the unknown transition dynamics of an MDP based on such observations. We describe the behavior-aware modeling (BAM) algorithm, which learns models of transiti...

Preference elicitation and inverse reinforcement learning

We state the problem of inverse reinforcement learning in terms of preference elicitation, resulting in a principled (Bayesian) statistical formulation. This generalises previous work on Bayesian inverse reinforcement learning and allows us to obtain a posterior distribution on the agent’s preferences, policy and optionally, the obtained reward sequence, from observations. We examine the relati...

Journal:
  • CoRR

Volume: abs/1601.06569

Publication date: 2016